

Debiasing Averaged Stochastic Gradient Descent to handle missing values

Neural Information Processing Systems

The stochastic gradient algorithm is a key ingredient of many machine learning methods and is particularly well suited to large-scale learning. However, a major caveat of large data sets is their incompleteness. We propose an averaged stochastic gradient algorithm that handles missing values in linear models. This approach has the merit of being free of any modeling of the data distribution and of accounting for heterogeneous missing proportions. In both streaming and finite-sample settings, we prove that this algorithm achieves a convergence rate of $\mathcal{O}(\frac{1}{n})$ at iteration $n$, the same as without missing values. We demonstrate the convergence behavior and the relevance of the algorithm not only on synthetic data but also on real data sets, including those collected from a medical registry.
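As an illustration of the idea described in the abstract, here is a minimal NumPy sketch of averaged SGD for least squares in which missing entries are zero-imputed and the stochastic gradient is debiased via coordinate-wise inverse probability weighting. The function name, the step-size schedule, and the assumption that the observation probabilities `q` are known are all illustrative choices, not the authors' exact implementation.

```python
import numpy as np

def debiased_averaged_sgd(X_obs, y, q, lr0=0.1):
    """Averaged SGD for least squares with missing covariates.

    X_obs : (n, d) design matrix with np.nan for missing entries
    y     : (n,) responses
    q     : (d,) per-coordinate probability of being *observed*
            (assumed known here; under MCAR it can be estimated)

    Illustrative sketch only: missing entries are zero-imputed,
    each coordinate is rescaled by 1/q_j (coordinate-wise inverse
    probability weighting), and the spurious diagonal term this
    rescaling introduces into E[x x^T] is subtracted so that the
    stochastic gradient is unbiased.
    """
    n, d = X_obs.shape
    beta = np.zeros(d)
    beta_bar = np.zeros(d)          # Polyak-Ruppert average of iterates
    for k in range(n):
        x = np.nan_to_num(X_obs[k])               # zero-imputation
        xw = x / q                                # coordinate-wise IPW
        # E[xw * (xw @ beta)] = x x^T beta + diag((1-q_j)/q_j * x_j^2) beta,
        # so the diagonal surplus is removed to debias the gradient.
        g = xw * (xw @ beta) - ((1.0 - q) / q**2) * x**2 * beta - y[k] * xw
        beta -= lr0 / np.sqrt(k + 1) * g
        beta_bar += (beta - beta_bar) / (k + 1)   # online averaging
    return beta_bar
```

Averaging the iterates (rather than returning the last one) is what yields the fast $\mathcal{O}(1/n)$ rate claimed in the abstract.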


Review for NeurIPS paper: Debiasing Averaged Stochastic Gradient Descent to handle missing values

Neural Information Processing Systems

A key quantity in the analysis is p_j, the probability that the j-th feature is missing. Is this quantity given in advance? Under MCAR it can be easily estimated, but some clarification is needed. Essentially, this method is SGD with inverse probability weighting (IPW), although it is coordinate-wise IPW rather than the regular IPW.
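As the review notes, under MCAR the observation (or missingness) probability of each feature can be estimated directly from the missingness pattern. A one-line NumPy illustration, on made-up toy data:

```python
import numpy as np

# Toy data: np.nan marks missing entries (values are made up).
X_obs = np.array([[1.0, np.nan],
                  [2.0, 3.0],
                  [np.nan, 4.0],
                  [5.0, 6.0]])

# Under MCAR, the probability that feature j is observed is simply
# the empirical fraction of non-missing entries in column j.
q_hat = 1.0 - np.isnan(X_obs).mean(axis=0)   # -> array([0.75, 0.75])
```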


Review for NeurIPS paper: Debiasing Averaged Stochastic Gradient Descent to handle missing values

Neural Information Processing Systems

The paper addresses the issue of missing values in SGD. There was agreement that the obtained results are novel, but there was no strong advocate for the paper.

